Prioritize the ordering of URL queue in Focused crawler


Abstract:

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task, and an unfocused approach often returns undesired results. Several new ideas have therefore been proposed. A key technique among them is focused crawling, which crawls particular topical portions of the World Wide Web quickly and efficiently without having to explore all web pages. The proposed approach does not rely on keywords alone; it also uses high-level background knowledge, expressed as concepts and relations, which is compared with the text of the page under consideration. This paper proposes a combined crawling strategy that integrates a link analysis algorithm with an association metric. The approach identifies relevant pages before they are crawled and prioritizes the URL queue, based on a domain-dependent ontology, so that the most relevant pages are downloaded first. The strategy uses the ontology to estimate the semantic content of a URL without visiting it, which in turn strengthens the ordering metric for the URL queue and leads to the retrieval of the most relevant pages.
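The queue-ordering idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the link analysis algorithm, the association metric, or the ontology, so everything named here is an assumption made for illustration. `ONTOLOGY` is a hypothetical weighted concept list, `link_score` stands in for a precomputed link-analysis value in [0, 1], and `alpha` is an arbitrary mixing weight. The sketch estimates relevance from the URL and anchor text alone (that is, without fetching the page) and keeps the frontier in a priority queue.

```python
import heapq
import re
from itertools import count

# Hypothetical domain ontology: concept -> weight (an assumption, not from the paper).
ONTOLOGY = {"crawler": 1.0, "ontology": 0.8, "search": 0.5}


def semantic_score(url, anchor_text, ontology=ONTOLOGY):
    """Fraction of ontology weight matched by tokens in the URL and anchor text."""
    tokens = set(re.findall(r"[a-z]+", (url + " " + anchor_text).lower()))
    total = sum(ontology.values())
    matched = sum(w for concept, w in ontology.items() if concept in tokens)
    return matched / total if total else 0.0


def priority(url, anchor_text, link_score, alpha=0.5):
    """Assumed linear blend of a link-analysis score and the semantic estimate."""
    return alpha * link_score + (1 - alpha) * semantic_score(url, anchor_text)


class Frontier:
    """URL queue that always yields the highest-priority unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion counter breaks ties between equal scores

    def push(self, url, anchor_text="", link_score=0.0):
        if url in self._seen:
            return  # ignore URLs that were already queued
        self._seen.add(url)
        score = priority(url, anchor_text, link_score)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    frontier.push("http://example.com/sports/news", "football results", link_score=0.2)
    frontier.push("http://example.com/ontology/crawler", "focused crawler with ontology", link_score=0.2)
    while frontier:
        print(frontier.pop())
```

With equal link scores, the URL whose text matches ontology concepts is dequeued first. A real system would replace the token overlap with a richer concept-and-relation comparison, as the abstract suggests.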


Similar resources


Multi-level Frontier based Topic-specific Crawler Design with Improved URL Ordering

The rapid growth of World Wide Web has urged the development of retrieval tools like search engines. Topic specific crawlers are best suited for the users looking for results on a particular subject. In this paper, a novel design of a topic specific web crawler based on multi-agent system is presented. The architecture proposed employs two types of agents: retrieval and coordinator agents. Coor...


Efficient Crawling Through URL Ordering

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evalu...


YAFC: Yet Another Focused Crawler

As the Web continues to grow rapidly, focused topic-specific Web crawlers will gain popularity over traditional general-purpose search engines for locating, indexing and keeping up to date information on the Web. This paper presents YAFC (Yet Another Focused Crawler), a neurodynamic programming approach to focused crawling. YAFC combines TD(λ) reinforcement learning with a neural network to lea...





Volume 2, Issue 1

Pages 25-31

Publication date: 2014-06-01
